We consider the following problem. Let us fix a finite alphabet $\mathcal{A} = \{1, 2, \dots, d\}$; for any $d$-tuple
of letter frequencies $(f_1, \dots, f_d) \in [0,1]^d$ with $\sum_{i=1}^{d} f_i = 1$, how can one construct an infinite word $u$ over
the alphabet $\mathcal{A}$ satisfying the following conditions: $u$ has linear complexity function, $u$ is uniformly
balanced, and the letter frequencies in $u$ are given by $(f_1, \dots, f_d)$? This paper investigates a construction
method for such words based on the use of mixed multidimensional continued fraction algorithms.

Keywords: balanced words, discrepancy, letter frequency, multidimensional continued fractions, 
discrete geometry 

1 Introduction 

We consider the following problem: let us fix a finite alphabet $\mathcal{A} = \{1, 2, \dots, d\}$; for any $d$-tuple of letter
frequencies $(f_1, \dots, f_d) \in [0,1]^d$ with $\sum_{i=1}^{d} f_i = 1$, how can one construct an infinite word $u$ over the alphabet
$\mathcal{A}$ satisfying the following conditions:

1. $u$ has linear complexity function

2. $u$ is uniformly balanced

3. the letter frequencies in $u$ are given by $(f_1, \dots, f_d)$.

Let us first recall several definitions in order to clarify the previous statement. A word $u \in \mathcal{A}^{\mathbb{N}}$ is said to
be uniformly balanced if there exists a constant $C > 0$ such that for any pair of factors of the same length
$v$, $w$ of $u$, and for any letter $i \in \mathcal{A}$,

$$\bigl| |v|_i - |w|_i \bigr| \le C,$$

where the notation $|x|_j$ stands for the number of occurrences of the letter $j$ in the factor $x$. A word $u$ has
linear complexity function if there exists a constant $C > 0$ such that the number of factors of $u$ of length
$n$ is smaller than $C \cdot n$, for every positive integer $n$. The frequency $f_i$ of a letter $i \in \mathcal{A}$ in $u = (u_n)_{n \in \mathbb{N}}$
is defined as the limit (when $n$ tends towards infinity), if it exists, of the number of occurrences of $i$ in
$u_0 u_1 \cdots u_{n-1}$ divided by $n$.
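The three notions recalled above can be checked numerically on a finite prefix of a known word. The following is a minimal sketch, assuming the Fibonacci word (the fixed point of 1 -> 12, 2 -> 1) as a test case; this example and the helper names are illustrative choices and do not come from the text.

```python
from collections import Counter

def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of 1 -> 12, 2 -> 1."""
    word = "1"
    while len(word) < length:
        word = "".join("12" if c == "1" else "1" for c in word)
    return word[:length]

def factor_complexity(word, n):
    """Number of distinct factors of length n occurring in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

def letter_frequencies(word):
    """Empirical letter frequencies of a finite word."""
    counts = Counter(word)
    return {letter: counts[letter] / len(word) for letter in sorted(counts)}

prefix = fibonacci_prefix(2000)
# For this Sturmian word the number of factors of length n is n + 1.
complexities = [factor_complexity(prefix, n) for n in range(1, 11)]
frequencies = letter_frequencies(prefix)
print(complexities)
print(frequencies)
```

On a finite prefix these quantities only approximate the limits in the definitions; the prefix length 2000 is an arbitrary choice.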
This problem has several motivations. The first one comes from discrete geometry: such an infinite
word can be seen as a coding of a discrete line in $\mathbb{Z}^d$. Indeed one associates with any infinite word over
the alphabet $\mathcal{A}$ a broken line obtained as a stair made of a union of segments of unit length directed
according to the coordinate axes, whose vertices are obtained by replacing each of the letters of $u$ by one
of the canonical basis vectors and by concatenating these vectors. Let $\mathbf{l} : \mathcal{A}^* \to \mathbb{N}^d$, $w \mapsto {}^t(|w|_1, \dots, |w|_d)$
stand for the abelianisation map, also called the Parikh mapping. More precisely, the set of vertices of this broken
line is equal to $\{\mathbf{l}(u_0 \cdots u_{N-1}) \mid N \in \mathbb{N}\}$. The question is to know how well the line associated with the
word $u$ approximates the Euclidean line directed by the vector of letter frequencies of $u$, when they exist.
There exist various strategies for defining and generating discrete lines in the three-dimensional space.
Without claiming to be exhaustive, let us quote e.g. [2] [9] [14] [23]. Nevertheless, they do not fulfill
Condition 1 on the linear complexity. Note that the notion of discrete line defined in [2] corresponds
to billiard words. Condition 1 means here that these discrete lines are "simple" in terms of number of
local configurations.
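The broken line attached to a word can be computed directly from the abelianisation map. The sketch below is illustrative only: the function names are hypothetical, and the sample word 1213121 is simply a short word over three letters.

```python
def parikh_vector(word, alphabet):
    """Abelianisation (Parikh vector): number of occurrences of each letter."""
    return tuple(word.count(letter) for letter in alphabet)

def broken_line_vertices(word, alphabet):
    """Vertices l(u_0 ... u_{N-1}) for N = 0, ..., len(word)."""
    index = {letter: k for k, letter in enumerate(alphabet)}
    position = [0] * len(alphabet)
    vertices = [tuple(position)]
    for letter in word:
        # Each letter moves one unit along its own coordinate axis.
        position[index[letter]] += 1
        vertices.append(tuple(position))
    return vertices

vertices = broken_line_vertices("1213121", "123")
print(vertices)
```

Each vertex is the Parikh vector of a prefix, so consecutive vertices differ by exactly one canonical basis vector.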

The second motivation comes from symbolic dynamical systems and Diophantine approximation: is
it possible to define a Rauzy fractal associated with any translation of the torus? More precisely, assume
we are given a translation $x \mapsto x + (\alpha_1, \dots, \alpha_d)$ defined on $\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$; the Rauzy fractal $\mathcal{R}$ associated
with an infinite word $u$ over the $d$-letter alphabet $\mathcal{A}$ is defined by projecting along the frequency vector
of $u$ on a transverse hyperplane the vertices of the broken line associated with $u$ (such as described above)
and then, by taking the closure. For more on Rauzy fractals, see e.g. [8]. The problem now becomes
the following: is it possible to construct an infinite word $u$ over the $d$-letter alphabet $\mathcal{A}$ such that $\mathcal{R}$ is a
compact set that tiles periodically this transverse hyperplane and such that $u$ has linear complexity? Let
us explain in this context the requirement concerning linear complexity (Condition 1): we would like to
recover from the dynamical and combinatorial properties of the infinite word $u$ arithmetical information
on the parameters underlying the translation on the torus. This will be easier if $u$ has a low complexity
function, i.e., a low number of factors. Let us quote as a further motivation uniform distribution and the
so-called chairman assignment problem, see e.g. [22], and the references therein.

There exist families of words that satisfy Conditions 2 and 3 but not Condition 1. Billiard words
are defined as codings of trajectories of billiards in a cube; they are shown to have quadratic complexity
(see [2] [6]). They satisfy Conditions 2 and 3. Let us also quote the construction described in [12] which
produces step by step a broken line whose vertices are integer points and that approximates a given direction by
choosing at each step the closest point. It is proved in [12] that such a broken line can be obtained by
selecting integer points by shifting a polygonal window along the line. The complexity is here again
quadratic. The corresponding infinite words satisfy Conditions 2 and 3. Note that 1-balanced words
over a larger alphabet do not seem to be good candidates for describing discrete segments in the space:
not all frequencies can be reached. Fraenkel's conjecture states that the possible frequencies for
1-balanced words are rational and uniquely determined, when they are assumed to be distinct [16]. In
particular, when $k = 3$, the only possible 1-balanced word is $(1213121)^{\infty}$ (if frequencies are distinct), up
to a permutation of letters and up to shifts. For the irrational case, see ifTHl and ifTTl. For more references
on the subject, see also the survey [24]. Note also that Arnoux-Rauzy words (see e.g. [5] HU [10]) are
infinite words that do not satisfy Condition 2, such as proved in ifTTTi, but that do satisfy Conditions 1
and 3. Furthermore, they are not defined for every $d$-tuple of letter frequencies, but only for a set of zero
measure in $[0,1]^d$. For an illustration, see Figure 4.
2 Multidimensional continued fractions and frequencies

The strategy we consider here for constructing infinite words satisfying the three above-mentioned
conditions consists in applying a multidimensional continued fraction algorithm to the frequency vector
$(f_1, \dots, f_d)$, according to Q. We then associate substitutions with the steps of the algorithm, that is,
rules that replace letters by words, with these substitutions having the matrices produced by the
continued fraction algorithm as incidence matrices. More precisely, a substitution $\sigma$ over the alphabet $\mathcal{A}$ is
an endomorphism of the free monoid $\mathcal{A}^*$, and the incidence matrix of the substitution $\sigma$ is the square
matrix $M_\sigma$ with entries $m_{ij} = |\sigma(j)|_i$ for all $i, j \in \mathcal{A}$.
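To make the definition of the incidence matrix concrete, here is a small sketch that computes it and checks the relation between the Parikh vector of a word and that of its image. The substitution 1 -> 12, 2 -> 13, 3 -> 1 is only an illustrative choice, and the function names are hypothetical.

```python
def incidence_matrix(substitution, alphabet):
    """Matrix with entry (i, j) equal to the number of letters i in sigma(j)."""
    return [[substitution[j].count(i) for j in alphabet] for i in alphabet]

def apply_substitution(substitution, word):
    """Image of a word under a substitution given as a dict letter -> word."""
    return "".join(substitution[letter] for letter in word)

def parikh_vector(word, alphabet):
    return [word.count(letter) for letter in alphabet]

alphabet = "123"
sigma = {"1": "12", "2": "13", "3": "1"}
M = incidence_matrix(sigma, alphabet)

word = "1213"
image = apply_substitution(sigma, word)
# The Parikh vector of sigma(w) equals M times the Parikh vector of w.
lhs = parikh_vector(image, alphabet)
p = parikh_vector(word, alphabet)
rhs = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
print(M, image, lhs, rhs)
```

Column $j$ of the matrix is the Parikh vector of $\sigma(j)$, which is why several substitutions can share the same incidence matrix: the matrix forgets the order of the letters.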

Let us recall the most classical multidimensional continued fraction algorithms such as described e.g.
in [20], and in ifTTl [10] [25] for Arnoux-Rauzy algorithm. For the sake of simplicity, we express them in
dimension $d = 3$:

• Jacobi-Perron: let $0 < u_1, u_2 \le u_3$,

  $$(u_1, u_2, u_3) \mapsto \left( u_2 - \left\lfloor \frac{u_2}{u_1} \right\rfloor u_1,\; u_3 - \left\lfloor \frac{u_3}{u_1} \right\rfloor u_1,\; u_1 \right).$$

• Brun: we subtract the second largest entry from the largest one; for instance, if $0 \le u_1 \le u_2 \le u_3$,

  $$(u_1, u_2, u_3) \mapsto (u_1, u_2, u_3 - u_2).$$

• Poincaré: we subtract the second largest entry from the largest one, and the smallest entry from the
  second largest one; for instance, if $0 \le u_1 \le u_2 \le u_3$,

  $$(u_1, u_2, u_3) \mapsto (u_1, u_2 - u_1, u_3 - u_2).$$

• Selmer: we subtract the smallest positive entry from the largest one; for instance, if $0 < u_1 \le u_2 \le u_3$,

  $$(u_1, u_2, u_3) \mapsto (u_1, u_2, u_3 - u_1).$$

• Fully subtractive: we subtract the smallest positive entry from all the largest ones; for instance, if
  $0 < u_1 \le u_2 \le u_3$,

  $$(u_1, u_2, u_3) \mapsto (u_1, u_2 - u_1, u_3 - u_1).$$

• Arnoux-Rauzy: let $0 \le u_1 \le u_2 \le u_3$ with $u_3 \ge u_1 + u_2$,

  $$(u_1, u_2, u_3) \mapsto (u_1, u_2, u_3 - u_1 - u_2);$$

  otherwise the algorithm stops.
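The rules listed above can be sketched as one-step maps on triples. In this sketch the subtractive rules act on the coordinates in place, without reordering them, ties between equal entries are broken by index, and the Jacobi-Perron rule is applied to the coordinates in the order given; these conventions and all function names are assumptions made for illustration.

```python
def _order(u):
    """Indices of the smallest, middle and largest entries (ties broken by index)."""
    return sorted(range(3), key=lambda k: (u[k], k))

def jacobi_perron(u):
    u1, u2, u3 = u  # assumes u1 > 0
    return (u2 - (u2 // u1) * u1, u3 - (u3 // u1) * u1, u1)

def brun(u):
    lo, mid, hi = _order(u)
    v = list(u)
    v[hi] -= u[mid]
    return tuple(v)

def poincare(u):
    lo, mid, hi = _order(u)
    v = list(u)
    v[mid] -= u[lo]
    v[hi] -= u[mid]
    return tuple(v)

def selmer(u):
    order = _order(u)
    positive = [k for k in order if u[k] > 0]
    v = list(u)
    if len(positive) >= 2:  # otherwise nothing is left to subtract
        v[order[-1]] -= u[positive[0]]
    return tuple(v)

def fully_subtractive(u):
    positive = [k for k in _order(u) if u[k] > 0]
    v = list(u)
    for k in positive[1:]:
        v[k] -= u[positive[0]]
    return tuple(v)

def arnoux_rauzy(u):
    lo, mid, hi = _order(u)
    if u[hi] < u[lo] + u[mid]:
        return None  # the algorithm stops
    v = list(u)
    v[hi] -= u[lo] + u[mid]
    return tuple(v)

u = (3, 5, 12)
for step in (jacobi_perron, brun, poincare, selmer, fully_subtractive, arnoux_rauzy):
    print(step.__name__, step(u))
```

Integer triples are used here so that the arithmetic is exact; the same functions also accept `fractions.Fraction` entries.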

Let $T$ be one of these algorithms applied to some vector $(f_1, f_2, f_3) \in [0,1]^3$. With each matrix $M$
produced by $T$, we associate a substitution whose incidence matrix is given by $M$. We thus obtain a word
by iterating these substitutions in an S-adic way. We recall that a word is said to be S-adic if it is
generated by composing a finite number of substitutions. This covers various families of words with a rich
dynamical behavior such as Sturmian sequences; for more on S-adic words, see e.g. [3] [13].
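As an illustration of S-adic generation, the sketch below composes a finite sequence of substitutions and applies the composition to a seed letter. The two substitutions on the alphabet {1, 2} are illustrative choices, not taken from the text, and the function names are hypothetical.

```python
def apply_substitution(substitution, word):
    """Image of a word under a substitution given as a dict letter -> word."""
    return "".join(substitution[letter] for letter in word)

def s_adic_word(substitutions, seed):
    """Compute sigma_1 o sigma_2 o ... o sigma_n (seed); sigma_n is applied first."""
    word = seed
    for substitution in reversed(substitutions):
        word = apply_substitution(substitution, word)
    return word

# Two substitutions drawn from a finite set, used alternately.
tau_a = {"1": "1", "2": "12"}
tau_b = {"1": "21", "2": "2"}

word = s_adic_word([tau_a, tau_b, tau_a, tau_b], "1")
print(word)
```

Longer sequences of substitutions give longer words; an infinite word arises as a limit when the successive images share growing common prefixes.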
3 Fusion algorithms

We can also mix these algorithms by performing at each step one among these rules, and this still yields
S-adic generated words. We call such algorithms fusion algorithms. We focus on fusion algorithms
obtained by applying Arnoux-Rauzy algorithm when possible, and otherwise, consistently one algorithm
among Brun, Poincaré, Selmer, or the Fully Subtractive algorithms. Indeed, experimental studies indicate
that a combination of Arnoux-Rauzy steps with Brun steps, or with Poincaré steps, produces good
performances (see Table 1 and Figure 5 below), and even better performances than when performing only one
algorithm. Furthermore, this allows us to exploit and extend the good mean behaviour of Arnoux-Rauzy
algorithm to a larger set of parameters (compare Figure 4 and Figure 5).

The aim of this lecture is to study the properties of such fusion algorithms for both finite (rational
frequencies) and infinite expansions (irrational frequencies). In particular, we will focus on the almost
everywhere convergence properties and ergodic properties of these fusion algorithms when the frequency
vector has irrational coordinates. The proof relies on classic techniques such as described e.g. in [20].
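A fusion of Arnoux-Rauzy and Poincaré steps can be sketched as follows for integer triples: an Arnoux-Rauzy step is performed whenever the largest entry is at least the sum of the other two, and a Poincaré step otherwise. The tie-breaking by index and the function names are assumptions of this sketch.

```python
def fusion_step(u):
    """One step of an Arnoux-Rauzy/Poincaré fusion on a triple of nonnegative numbers."""
    lo, mid, hi = sorted(range(3), key=lambda k: (u[k], k))
    v = list(u)
    if u[hi] >= u[lo] + u[mid]:
        # Arnoux-Rauzy step: subtract the two smaller entries from the largest one.
        v[hi] = u[hi] - u[lo] - u[mid]
        return "AR", tuple(v)
    # Poincaré step: smallest from second largest, second largest from largest.
    v[mid] = u[mid] - u[lo]
    v[hi] = u[hi] - u[mid]
    return "P", tuple(v)

def fusion_expansion(u):
    """Iterate until at most one entry is nonzero; return the step names and the final vector."""
    steps = []
    while sum(1 for x in u if x != 0) > 1:
        name, u = fusion_step(u)
        steps.append(name)
    return steps, u

steps, final = fusion_expansion((3, 5, 12))
print(steps, final)
```

For integer triples the sum of the entries strictly decreases at each step while at least two entries are nonzero, so the loop terminates.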





Algorithm                                         Minimum   Mean     Maximum   Std
Arnoux-Rauzy                                      0.6000    0.9055   1.200     0.1006
Fully subtractive                                 0.6000    5.982    13.92     4.388
Fully subtractive as possible                     0.6000    4.172    25.00     4.440
Selmer                                            0.5000    2.184    12.75     2.070
Brun                                              0.5000    1.114    2.000     0.2664
Brun Multiplicative                               0.6000    1.117    2.000     0.2681
Poincaré                                          0.6000    2.527    11.13     2.261
Jacobi-Perron                                     0.6000    2.731    25.00     3.456
Random reduction                                  0.5000    2.426    24.99     2.779
Fusion of Arnoux-Rauzy and Fully subtractive      0.6000    1.095    2.800     0.3105
Fusion of Arnoux-Rauzy and Selmer                 0.6000    0.9678   1.450     0.1438
Fusion of Arnoux-Rauzy and Brun Multiplicative    0.6000    0.9132   1.400     0.1143
Fusion of Arnoux-Rauzy and Poincaré               0.6000    0.8941   1.200     0.09733

Table 1: Statistics (minimum, mean, maximum, standard deviation) for the discrepancy for triplets of
nonnegative rational vectors $(a_1/N, a_2/N, a_3/N)$ such that $a_1 + a_2 + a_3 = N$ with $N = 100$.

Consider now the case of rational frequencies. Table 1 displays some experimental results. We
work here in dimension $d = 3$ with rational frequency vectors of the form $\mathbf{f} = (a_1/N, a_2/N, a_3/N)$, with
$a_i \in \mathbb{N}$, $i = 1, 2, 3$, and with $a_1 + a_2 + a_3 = N$ being a positive integer. We apply a fusion algorithm to
such a triplet, until we reach a vector whose entries are all equal to 0 but one. This produces a finite
sequence of matrices, and thus, of substitutions, having these matrices as incidence matrices. Note that
we have several choices for these substitutions, even if the incidence matrices have entries in $\{0, 1\}$.
Given a matrix $M$, we thus have to decide in which order letters will be chosen in the image of a letter
by a substitution $\sigma$ having $M$ as incidence matrix. We choose as a convention to put the most frequent
letter first. (This partly explains why the triangles obtained in Figures 1, 2, 3, 4 and 5 are not perfectly
symmetric.) Let us apply now to $\mathbf{f}$ a finite sequence of steps of a fusion algorithm together with a choice
of substitutions associated with the produced matrices. One has $\mathbf{f} = M_1 \cdots M_n \mathbf{f}_n$, where the vector $\mathbf{f}_n$ has
two coordinates equal to 0, and one non-zero coordinate of index, say $w_n \in \{1, 2, 3\}$. The associated
substitutions are denoted by $\sigma_k$ for $1 \le k \le n$. The following diagram illustrates how we produce finite
words $w$ with frequency vector $\mathbf{f}$:

$$\mathbf{f} = \mathbf{f}_0 \xleftarrow{\;M_1\;} \mathbf{f}_1 \xleftarrow{\;M_2\;} \mathbf{f}_2 \xleftarrow{\;M_3\;} \cdots \xleftarrow{\;M_n\;} \mathbf{f}_n$$

$$w = w_0 \xleftarrow{\;\sigma_1\;} w_1 \xleftarrow{\;\sigma_2\;} w_2 \xleftarrow{\;\sigma_3\;} \cdots \xleftarrow{\;\sigma_n\;} w_n \in \{1, 2, 3\}$$
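The diagram can be turned into a short program for the fusion of Arnoux-Rauzy and Poincaré steps. This is a sketch under stated assumptions: the substitutions are one possible choice compatible with the incidence matrices, the rule "letter of the larger entry first" is our reading of the convention on letter order, ties are broken by index, and the function name is hypothetical.

```python
def fusion_word(a):
    """Finite word over {1, 2, 3} built from an integer triple (a1, a2, a3)."""
    u = tuple(a)
    substitutions = []
    while sum(1 for x in u if x != 0) > 1:
        lo, mid, hi = sorted(range(3), key=lambda k: (u[k], k))
        L, M, H = str(lo + 1), str(mid + 1), str(hi + 1)
        v = list(u)
        if u[hi] >= u[lo] + u[mid]:
            # Arnoux-Rauzy step: u_hi = v_hi + v_lo + v_mid.
            v[hi] = u[hi] - u[lo] - u[mid]
            sigma = {L: H + L, M: H + M, H: H}
        else:
            # Poincaré step: u_mid = v_mid + v_lo and u_hi = v_hi + v_mid + v_lo.
            v[mid] = u[mid] - u[lo]
            v[hi] = u[hi] - u[mid]
            sigma = {L: H + M + L, M: H + M, H: H}
        substitutions.append(sigma)
        u = tuple(v)
    # w_n is the letter indexing the unique nonzero coordinate of the last vector.
    word = str(max(range(3), key=lambda k: u[k]) + 1)
    # w = sigma_1 o ... o sigma_n (w_n): apply sigma_n first.
    for sigma in reversed(substitutions):
        word = "".join(sigma[c] for c in word)
    return word

w = fusion_word((3, 5, 12))
print(w, [w.count(c) for c in "123"])
```

When the entries of the triple have greatest common divisor $g$, the word produced has Parikh vector $(a_1/g, a_2/g, a_3/g)$, so its letter frequencies still equal $(a_1/N, a_2/N, a_3/N)$.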

The experimental results of Table 1 indicate that the fusion algorithm obtained when applying
Arnoux-Rauzy algorithm when possible, and otherwise, Poincaré algorithm, behaves in an efficient way with
respect to the discrepancy. The discrepancy of a finite word $u_0 \cdots u_n \in \mathcal{A}^{n+1}$ is defined as

$$\max_{i \in \mathcal{A},\ 0 \le k \le n} \bigl| f_i \cdot k - |u_0 \cdots u_k|_i \bigr|.$$

This distance is considered e.g. in [22] and [1], and is intimately connected with the following balance
measure. The balance of $u_0 \cdots u_n \in \mathcal{A}^{n+1}$ is defined as

$$\max_{i \in \mathcal{A},\ |v| = |w|} \bigl| |v|_i - |w|_i \bigr|$$

(here $v$, $w$ are factors of $u$ of the same length $|v| = |w|$). We have chosen here to use the discrepancy for
our numerical experiments in order to compare our results with the bound discussed in [22]. Indeed, in
[22], an algorithm is given that produces, for any given frequency vector $(f_1, \dots, f_d)$, an infinite word
whose discrepancy is smaller than or equal to $1 - 1/(2d - 2)$ (this yields 3/4 for $d = 3$). However, the
lowest possible asymptotic order for the factor complexity of such a word does not seem to be known;
nothing seems a priori to prevent it from being linear. In the fusion algorithm obtained by combining
Arnoux-Rauzy algorithm with Poincaré algorithm, one obtains a mean discrepancy equal to 0.8910 when
$N = 100$. More generally, Figures 1, 2, 3, 4 and 5 below illustrate the behaviour of the discrepancy for triplets
of nonnegative rational vectors $(a_1/N, a_2/N, a_3/N)$ such that $a_1 + a_2 + a_3 = N$ for a given $N$.
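Both measures are easy to evaluate on a finite word. In the sketch below the frequencies $f_i$ are taken to be the letter frequencies of the word itself, and $f_i \cdot k$ is compared with the number of occurrences of $i$ in the prefix of length $k$; this indexing convention is an assumption of the sketch, as are the function names. Exact rational arithmetic avoids rounding effects.

```python
from fractions import Fraction

def discrepancy(word, alphabet):
    """Max over letters i and prefix lengths k of |f_i * k - (count of i in the prefix)|."""
    n = len(word)
    freq = {a: Fraction(word.count(a), n) for a in alphabet}
    counts = {a: 0 for a in alphabet}
    worst = Fraction(0)
    for k, letter in enumerate(word, start=1):
        counts[letter] += 1
        for a in alphabet:
            worst = max(worst, abs(freq[a] * k - counts[a]))
    return worst

def balance(word, alphabet):
    """Max over letters i and factors v, w of equal length of | |v|_i - |w|_i |."""
    worst = 0
    for length in range(1, len(word) + 1):
        factors = [word[i:i + length] for i in range(len(word) - length + 1)]
        for a in alphabet:
            counts = [factor.count(a) for factor in factors]
            worst = max(worst, max(counts) - min(counts))
    return worst

word = "1213121"
print(discrepancy(word, "123"), balance(word, "123"))
```

The balance computation is cubic in the length of the word, which is acceptable for words of length around 100 but would need a sliding-window count for longer inputs.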



Figure 1: Discrepancy for triplets of nonnegative rational vectors $(a_1/N, a_2/N, a_3/N)$ such that $a_1 + a_2 + a_3 = N$
with $N = 20$ using Poincaré algorithm. Statistics displayed with the figure: min 0.6000, mean 1.484,
max 3.000, std 0.6137. The triangle has vertices labelled (18,1,1), (1,1,18) and (1,18,1); the scale
ranges from 1 to 3.
Figure 2: Discrepancy for triplets of nonnegative rational vectors $(a_1/N, a_2/N, a_3/N)$ such that $a_1 + a_2 + a_3 = N$
with $N = 100$ using Fully subtractive algorithm. Statistics displayed with the figure: min 0.6000,
mean 5.982, max 13.92, std 4.388.




Figure 3: Discrepancy for triplets of nonnegative rational vectors $(a_1/N, a_2/N, a_3/N)$ such that $a_1 + a_2 + a_3 = N$
with $N = 100$ using Poincaré algorithm.
Figure 4: Discrepancy for triplets of nonnegative rational vectors $(a_1/N, a_2/N, a_3/N)$ such that $a_1 + a_2 + a_3 = N$
with $N = 100$ using Arnoux-Rauzy algorithm. This algorithm is defined only for vectors whose
largest entry is greater than or equal to the sum of the other two. Statistics displayed with the figure:
min 0.6000, mean 0.9055, max 1.200, std 0.1006.



Figure 5: Discrepancy for triplets of nonnegative rational vectors $(a_1/N, a_2/N, a_3/N)$ such that $a_1 + a_2 + a_3 = N$
with $N = 100$ using a fusion of Arnoux-Rauzy and Poincaré algorithms. Statistics displayed with the
figure: min 0.6000, mean 0.8941, max 1.200, std 0.09733.